(54) Computer for simultaneously executing plural instructions.

(57) A computer for simultaneously executing plural instructions. A decision means determines the types
of operation and the possibility of simultaneous execution for the plural instructions when they are read
out from the main memory (1) to the cache memory (3). The result of this determination is called a
decision result. The plural instructions and the decision result are stored in the cache memory (3). The
decision process is performed in order for several subsets of the plural instructions read out from the
main memory (1) to the cache memory (3).

Then, the plural instructions are respectively assigned to corresponding operation units according to
the decision result and executed. As a result of this arrangement, the decision process for the plural
instructions need not be repeated each time the instructions are read out from the cache memory (3) to
the operation unit.
four instructions is decided when they are read out from the main memory 1
to the cache memory 3 according to the present invention. The invention
includes a decision means, which is embodied as an instruction contention
analysis section 11. The instruction contention analysis section 11
determines the type of operation for each of the four instructions by
referring to their operation codes and decides the corresponding operation
unit for each of the four instructions. Then the section 11 determines
whether the types of operations are mutually different or not. If at least
two of the types of operations are the same, the section 11 adds a flag to
the corresponding instruction. The flag indicates that the corresponding
instruction is not to be executed during the same clock cycle. The
instruction contention analysis section 11 examines whether the same
operation unit corresponds to at least two of the four instructions or not.
If, for example, two instructions requiring the same operation unit are
detected, the section 11 sets the flag of the second instruction, which
contends with the first instruction, to "1". The flag belongs to the lower
priority instruction (the second instruction) and requires that the second
instruction wait to be executed by the operation unit. The priority of an
instruction is defined by its address among the four instructions. The
larger the address of the instruction, the lower the priority of the
instruction. Accordingly, the instruction corresponding to the first
priority among the four instructions always has a flag value of "0". (In
some embodiments, the flag bit corresponding to the first priority is
omitted.) In short, the three instructions corresponding to the second,
third and fourth priority include a flag bit for indicating the contention
of the operation unit.
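
The flag rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the circuit of the embodiment: the function name contention_marks and the integer unit codes are hypothetical, and the sketch assumes that each code identifies a single operation unit.

```python
from typing import List


def contention_marks(unit_types: List[int]) -> List[int]:
    """Return one contention flag per instruction, highest priority first.

    unit_types[i] is a hypothetical integer code for the operation unit
    required by the instruction of priority i (index 0 = lowest address).
    """
    marks = []
    for i, unit in enumerate(unit_types):
        # Flag is 1 when a higher-priority instruction in the same group
        # (smaller index) requires the same operation unit.
        marks.append(1 if unit in unit_types[:i] else 0)
    return marks
```

With this rule the first priority instruction always receives "0", because no instruction in the group has a higher priority.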

Figure 2 shows a format of a decision result output from the instruction
contention analysis section 11. The format includes three bits for each
instruction to indicate the type of operation for the instruction and one
bit for each instruction (except for the first priority instruction) to
indicate the contention of the operation unit. As shown in Figure 2, a
preferred embodiment includes seven types of operations. Therefore, three
bits are necessary to indicate the type of operation. If an instruction
must wait to execute because of overlap of the operation unit, the bit
indicating the contention of the operation unit is set to "1". The format
has fifteen bits in total, twelve bits of which respectively indicate the
type of operation for the four instructions (#0, #1, #2, #3) and three bits
of which (MARK) respectively indicate the contention of the operation unit
for three instructions (#1, #2, #3). There is no flag shown for instruction
#0. The four instructions and the decision result (fifteen bits in total)
are stored in the cache memory 3. The above-mentioned decision process is
performed in order for several subsets of four instructions in the main
memory 1 whenever they are read out from the main memory to the cache
memory.
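
As an illustration of the fifteen-bit decision result, the following Python sketch packs four 3-bit type codes and three MARK bits into one integer and unpacks them again. The bit order (types for #0 to #3 in the upper twelve bits, MARK bits for #1 to #3 in the lower three) is an assumption made for the sketch, since Figure 2 is not reproduced here; the function names are hypothetical.

```python
from typing import List, Tuple


def pack_decision_result(unit_types: List[int], marks: List[int]) -> int:
    """Pack four 3-bit type codes and three MARK bits into 15 bits."""
    if len(unit_types) != 4 or len(marks) != 3:
        raise ValueError("expected 4 type codes and 3 mark bits")
    word = 0
    for t in unit_types:  # instructions #0..#3, three bits each
        if not 0 <= t < 8:
            raise ValueError("type code must fit in three bits")
        word = (word << 3) | t
    for m in marks:  # MARK bits for instructions #1..#3
        word = (word << 1) | (m & 1)
    return word


def unpack_decision_result(word: int) -> Tuple[List[int], List[int]]:
    """Inverse of pack_decision_result under the same assumed layout."""
    marks = [(word >> shift) & 1 for shift in (2, 1, 0)]
    unit_types = [(word >> (3 + 3 * k)) & 0b111 for k in (3, 2, 1, 0)]
    return unit_types, marks
```

Three bits suffice for the seven types of operations because three bits encode eight distinct values.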



Then, the four instructions and the decision result are read out from the
cache memory 3 to an instruction buffer 12. The distribution matrix 10
(assignment means) reads out the four instructions with the decision result
from the instruction buffer 12. The distribution matrix 10 supplies the
four instructions to the corresponding operation units 4-9 according to the
decision result. At this moment, the distribution matrix 10 refers to a
program counter (not shown in the Figure). The program counter indicates
the address of the final instruction previously executed. The distribution
matrix 10 confirms the end of execution of all four previous instructions
by referring to the program counter. In applicant's invention, it is not
necessary for the type of operation and the contention of the four
instructions to be decided when the instructions are read out from the
cache memory to the operation unit. The distribution matrix 10 is able to
immediately supply the four instructions to the operation units according
to the decision result.

After the four instructions are supplied, the operation units 4-9 read out
from the register the data that is necessary for executing the instruction.
(The register section 13 consists of nine registers in this example.) (See
Fig. 6) At this time, the operation unit analyzes the register number to
which the result of the operation is to be written according to the
instruction and sets a flag corresponding to the register number in the
score board 14 to "1" according to the type of operation (see Fig. 6). The
score board 14 stores information that indicates whether or not the
register is monopolized by the operation unit for several cycles. If the
operation unit detects, according to the score board 14, that the value of
a register needed for executing the instruction is unusable (monopolized by
another operation unit), the operation unit abandons the executed result
and executes the same instruction again in the next cycle.

Figure 3 shows a construction of the instruction contention analysis
section 11 (decision means). As shown in Figure 3, the instruction
contention analysis section 11 includes the read-out buffer 31, the four
decoders 32, 33, 34, 35 and a decision result generation section 36. The
read-out buffer 31 temporarily stores four instructions read out from the
main memory 1. The four instructions are respectively transferred from the
read-out buffer 31 to the cache memory 3 and the four decoders 32, 33, 34,
35. The first (highest) priority instruction is transferred to the decoder
32. The fourth (lowest) priority instruction is transferred to the decoder
35. The decoders 32, 33, 34, 35 analyze the type of operation unit required
to execute the instruction, and the decoders 33, 34, 35 examine whether the
same operation unit is required by a higher priority instruction or not.
The three decoders 33, 34, 35 respectively receive the decode result of the
higher priority decoders 32, 33, 34 in order and decide
whether the received decode result indicates conten- 
The present invention relates to a computer having a plurality of types of
operation units, where the plurality of operation units can execute
instructions simultaneously.

Description of the Background 

In conventional computer systems, the computer reads out instructions one
by one from a main memory and executes the instructions in order. Even
though a conventional computer has a plurality of types of operation units,
only one operation unit executes one instruction during any one clock
cycle. Thus, the processing speed of conventional computers is low.

Recently, computers having a plurality of types of operation units have
been put to practical use. Such a computer reads out plural instructions
from the main memory and executes the plurality of instructions
simultaneously. A representative method is called "Super Scalar". In this
method, the computer includes a cache memory. First, a subset of the
instructions stored in main memory is read out from the main memory to the
cache memory. The number of instructions to be executed simultaneously is
fixed in advance. Then, the subset of the instructions is read out from the
cache memory. The instructions are examined to determine the type of the
operation unit required by each instruction, and the possibility of
simultaneous execution. Ones of the subset of instructions are respectively
supplied to corresponding operation units according to the decision result,
and the operation units execute the corresponding instructions. The
decision process is performed in order for several subsets of instructions
read out from the cache memory.

However, in the "Super Scalar" method, a decision is made for the subsets
of instructions when the instructions are read out from the cache memory to
the operation unit. The cache memory stores instructions that are used
repeatedly. Therefore, the system repeatedly performs the deciding step
whenever the instructions are read out from the cache memory. In short, the
cycle time for executing the instruction increases because of the repeated
decision process.

SUMMARY OF THE INVENTION 

Accordingly, it is one object of the present invention to provide a
computer for simultaneously executing plural instructions, where the cycle
time for executing the instruction of the computer does not increase.

It is another object of the present invention to provide a computer for
simultaneously executing plural instructions, where the computer does not
decide the possibility of simultaneous execution for the plural
instructions when the instructions are read out from the cache memory to
the operation unit.

These and other objects of the present invention are accomplished by
deciding whether simultaneous execution is possible for the plural
instructions when they are read out from the main memory to the cache
memory. The plural instructions and the decision result are stored in the
cache memory. The decision process is performed in order for several
subsets of the plural instructions read out from the main memory. Then, the
plural instructions are immediately assigned to corresponding operation
units according to the decision result. As a result of this arrangement, a
repeated decision process for the plural instructions is not necessary when
the plural instructions are read out from the cache memory to the operation
unit.
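
The effect of deciding once, at the time of the transfer from the main memory to the cache memory, can be sketched in Python as follows. This is a simplified software analogy under stated assumptions, not the hardware of the embodiment: the class PredecodeCache, its attributes and the decide callback are hypothetical names, and cache replacement is ignored.

```python
from typing import Callable, Dict, List, Tuple


class PredecodeCache:
    """Stores each instruction group together with its decision result."""

    def __init__(self, main_memory: Dict[int, List[str]],
                 decide: Callable[[List[str]], object]) -> None:
        self.main_memory = main_memory
        self.decide = decide
        self.lines: Dict[int, Tuple[List[str], object]] = {}
        self.decisions = 0  # how many times the decision process ran

    def fetch(self, address: int) -> Tuple[List[str], object]:
        if address not in self.lines:
            # Miss: read the group from main memory and decide once.
            group = self.main_memory[address]
            self.decisions += 1
            self.lines[address] = (group, self.decide(group))
        # Hit: the stored decision result is reused without deciding again.
        return self.lines[address]
```

Fetching the same group repeatedly, as in a loop, leaves the decision count at one, which corresponds to the saving described above.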

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a block diagram of a computer for
simultaneously executing plural instructions
according to the present invention;

Figure 2 shows a format of the plural instructions
and the decision result according to the present
invention;

Figure 3 shows a construction of an instruction
contention analysis section according to the
present invention;

Figure 4 shows a construction of a distribution
matrix according to the present invention;

Figure 5 shows a construction of a score board
stored in a memory according to the present
invention; and

Figure 6 shows an example of a score board for
simultaneously executing the plural instructions
according to the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

Figure 1 shows a block diagram of a computer for simultaneously executing
plural instructions. A main memory 1 stores a large number of instructions
in a predetermined order. A subset of the instructions is read out from the
main memory 1 to a cache memory 3 inside a processor 2, and the
instructions are respectively supplied to the operation units 4-9, where
they are executed. Figure 1 shows an embodiment of the present invention
including a floating-adder (F-ADD) 4, a floating-multiplier (F-MUL) 5, two
integer-arithmetic logical units (I-ALU) 6 and 7, a branch-control unit 8
and an exception process unit (EXCEPTION) 9. A distribution matrix 10
supplies the plural instructions to the operation units 4-9 according to
the type of operation performed by the instructions.

The subset of plural instructions is read out from the main memory 1 to the
cache memory 3 at the beginning.

For example, several subsets of four instructions, which are to be
repeatedly used, are read out from the main memory 1 in order.

The possibility of simultaneous execution for the 



EP 0 449 661 A2 



I6) are executed simultaneously at clock cycle 7.
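
The cycle numbers in the Figure 6 example are consistent with a simple rule: an instruction that reads a register written by a multiple clock cycle operation can start only after the score board bit of that register is reset. A minimal Python sketch of this arithmetic follows; the function names are hypothetical, and the sketch assumes that the bit stays set for exactly the latency of the producing operation.

```python
def busy_cycles(issue_cycle: int, latency: int) -> range:
    """Clock cycles during which the write register's bit stays at 1."""
    return range(issue_cycle, issue_cycle + latency)


def earliest_dependent_issue(issue_cycle: int, latency: int) -> int:
    """First clock cycle at which a reader of that register can execute."""
    return issue_cycle + latency


# I2 (three clock cycles) starts at cycle 1, so I4 can start at cycle 4.
# I4 (three clock cycles) starts at cycle 4, so I5 and I6 start at cycle 7.
print(earliest_dependent_issue(1, 3), earliest_dependent_issue(4, 3))
```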

The described embodiment contains a subset of 
four instructions. Other embodiments may contain a 
different number of instructions in a subset. 



Claims 

1. A computer for simultaneously executing plural instructions, comprising
plural types of operation units (4, 5, 6, 7, 8, 9) each for executing one
of a plurality of corresponding types of instructions, and main memory (1)
for pre-storing a plurality of instructions, characterized in that:

decision means (11) for deciding the types of operation unit for each
instruction of a subset of the plural instructions according to the types
of the subset of the plural instructions and for deciding a possibility of
simultaneous execution of the subset of the plural instructions when the
subset of the plural instructions are read from the main memory (1),
wherein the decision of the decision means is called a decision result;
cache memory (3) for temporarily storing the subset of the plural
instructions and the decision result; and assignment means (10) for
respectively assigning ones of the subset of the plural instructions to
corresponding types of the operation units according to the decision stored
in the cache memory (3).

2. The computer for simultaneously executing plural instructions according
to claim 1, wherein the decision means (11) determines that the plural
instructions are to be simultaneously executed when the types of operations
are mutually different.

3. The computer for simultaneously executing plural instructions according
to claim 1, wherein the decision means (11) determines that the plural
instructions are not to be simultaneously executed when at least two of the
types of operations are the same.

4. The computer for simultaneously executing plural instructions according
to claim 3, wherein the decision means determines that the plural
instructions are to be simultaneously executed in case that number of the
plural instructions whose types of operations are the same is less than
number of the operation units whose type corresponds to the type of the
plural instructions.

5. The computer for simultaneously executing plural instructions according
to claim 1, wherein the assignment means (10) is a distribution matrix for
supplying an instruction to a corresponding operation unit according to the
type of the operation unit and the possibility of simultaneous execution.

6. The computer for simultaneously executing plural instructions according
to claim 1, wherein the plural instructions are repeatedly assigned to the
corresponding operation units (4, 5, 6, 7, 8) by the assignment means (10)
and executed by the corresponding operation units.

7. The computer for simultaneously executing plural instructions according
to claim 1, further comprising a plurality of registers (13), where
information in the plurality of registers is accessed by the operation
units during execution.

8. The computer for simultaneously executing plural instructions according
to claim 7, wherein execution of a multiple clock cycle instruction by an
operation unit causes at least one register to be monopolized for more than
one clock cycle, further comprising a score board (14), which has flags
corresponding to respective ones of the registers for indicating whether
the registers are monopolized by the execution of an instruction or not.

9. The computer for simultaneously executing plural instructions according
to claim 8, wherein one of the flags in the score board (14) is set,
corresponding to a monopolized register, according to a priority of the
plural instructions.

10. The computer for simultaneously executing plural instructions according
to claim 9, wherein an executed result is produced by a multiple clock
cycle instruction and wherein the operation unit decides to write the
executed result in the monopolized register or abandon the executed result,
according to the flag corresponding to the monopolized register in the
score board (14).

11. A computer for simultaneously executing plural instructions, comprising
plural types of operation units (4, 5, 6, 7, 8, 9), each for executing one
of a plurality of corresponding types of instructions, and main memory (1)
for pre-storing a plurality of instructions, characterized in that:

decision means (11) for deciding the type of operation unit for each
instruction of a subset of the plural instructions according to the types
of the subset of the plural instructions and for deciding a possibility of
simultaneous execution of the subset of the plural instructions, which are
to be repeatedly executed, whenever several subsets of the plural
instructions are read out from the main memory (1) in order, wherein the
decision of the decision means is called a decision result;
cache memory (3) for temporarily storing






tion of the operation unit or not. If the type of operation 
unit of the decoder is the same as the type of the operation unit of the
higher priority instruction, the decoder sends a mark bit of "1". The
decision result generation section 36 receives the type of operation unit
and the mark bit, which indicates whether or not there is contention for
the operation unit, from the four decoders 32, 33, 34, 35. The decision
result generation section 36 generates the decision result as shown in
Figure 2. Then the decision result corresponding to the four instructions
is stored in the cache memory 3.

Figure 4 shows a construction of the distribution matrix 10 (assignment
means). The matrix 10 includes four decoders 41, 42, 43, 44 and a switch
matrix 45. Each decoder receives a type of operation unit and a contention
mark from the instruction buffer 12. The decoder 41 (highest priority)
receives only the type of operation unit. If the contention mark is "0",
the decoder opens a gate of the switch matrix 45 corresponding to the type
of operation unit. (The decoder 41 opens the gate unconditionally.) If the
contention mark is "1", the decoder does not open the gate corresponding to
the type of operation unit and opens the gate at the next cycle. In short,
the gates of the switch matrix 45 are opened at the execution timing
determined by the type of operation unit and the contention mark.
Therefore, the plural instructions are effectively supplied to the
corresponding operation units without excess overhead time.
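
The gate timing described for the distribution matrix can be sketched in Python as follows. The sketch follows the text literally, so a marked instruction is delayed by exactly one cycle; the function name gate_schedule, the slot numbering and the integer unit codes are hypothetical.

```python
from typing import Dict, List, Tuple


def gate_schedule(unit_types: List[int],
                  marks: List[int]) -> Dict[int, List[Tuple[int, int]]]:
    """Group (slot, unit type) pairs by the cycle offset of their gate.

    unit_types holds the four type codes (slot 0 = highest priority).
    marks holds the three contention marks for slots 1 to 3.
    Offset 0 is the current cycle and offset 1 is the next cycle.
    """
    # The highest-priority decoder opens its gate unconditionally.
    full_marks = [0] + list(marks)
    schedule: Dict[int, List[Tuple[int, int]]] = {0: [], 1: []}
    for slot, (unit, mark) in enumerate(zip(unit_types, full_marks)):
        schedule[mark].append((slot, unit))
    return schedule
```

Because the marks are read from the stored decision result, no comparison between instructions is needed at this stage.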

Figure 5 shows a construction of the score board 14. The score board 14
includes a score board register 51 and four judgement circuits 52, 53, 54,
55, which are cyclically connected by corresponding bits. (A preferred
embodiment has nine registers in register section 13.) The score board
register 51 consists of bits corresponding to the registers in the register
section 13, respectively. At the beginning, the score board register 51
stores information concerning whether the previous four instructions have
completed executing or not. If all bits of the score board register 51 are
"0", the previous four instructions have completed executing. If at least
one bit of the score board register 51 is "1", a previous instruction has
not finished executing. When all bits of the score board register 51 are
"0", the four judgement circuits begin processing. The judgement circuit 52
receives the first priority instruction through the operation unit and
detects the register number for writing the operation result where the
operation requires plural clock cycles. The judgement circuit 52 writes "1"
in the bit corresponding to the register number if the operation requires
plural clock cycles and sends the completion signal to the corresponding
operation unit. The judgement circuit 53 receives the second priority
instruction through the operation unit and detects the register number for
writing the operation result, where the operation requires plural clock
cycles, and the register number for reading data which is necessary for
execution. The judgement circuit 53 receives the bits of the score board
register 51 including the judgement result of the judgement circuit 52. If
the bit corresponding to the register number for reading is set to "1", the
judgement circuit 53 sends the wait signal to the corresponding operation
unit. If the bit corresponding to the register number for reading is reset
to "0", the judgement circuit 53 sends the completion signal to the
corresponding operation unit. The judgement circuit 53 writes "1" in the
bit corresponding to the register number for writing if the operation
requires plural clock cycles. The judgement circuit 54 (corresponding to
the third priority instruction) and the judgement circuit 55 (corresponding
to the fourth priority instruction) perform the same processing as the
judgement circuit 53. (The bit "1" of the score board register 51 is reset
to "0" when the operation unit writes the operation result in the
corresponding register.)
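
The chain of judgement circuits can be sketched in Python as follows. The names Instr and judge are hypothetical, and two assumptions are made that the text does not state outright: the first priority circuit performs no read check, as in the description of the judgement circuit 52, and an instruction that receives the wait signal does not set its write bit until it is actually executed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instr:
    reads: Tuple[int, ...]  # register numbers read by the instruction
    write: int              # register number written by the instruction
    multi_cycle: bool       # True if the operation needs plural clock cycles


def judge(scoreboard: List[int], group: List[Instr]) -> List[str]:
    """Return "complete" or "wait" per instruction, in priority order.

    scoreboard is a list of bits, one per register, updated in place so
    that each circuit sees the result of the higher-priority circuits.
    """
    signals = []
    for priority, ins in enumerate(group):
        # Assumption: only the second to fourth circuits check read registers.
        blocked = priority > 0 and any(scoreboard[r] for r in ins.reads)
        if blocked:
            signals.append("wait")
            continue  # assumption: a waiting instruction sets no bit yet
        signals.append("complete")
        if ins.multi_cycle:
            scoreboard[ins.write] = 1  # register is monopolized
    return signals
```

A nine-element list of zeros models the score board register 51 of the preferred embodiment at the start of a group.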

Figure 6 shows an example of the score board 14 over time for
simultaneously executing plural instructions. This example assumes that two
instructions are simultaneously executed. As shown in an upper portion of
Figure 6, three groups of two instructions (I1, I2) (I3, I4) (I5, I6) are
executed in order.

When the first two instructions (I1, I2) are executed at clock cycle 1, all
bits of the score board register are reset to "0". The read registers (r1,
r5) of the instruction (I2) differ from the write register (r3) of the
instruction (I1). Therefore, the two instructions (I1, I2) are able to be
executed simultaneously at clock cycle 1. However, the instruction (I1) is
an add operation, which requires one clock cycle, and the instruction (I2)
is a multiply operation, which requires three clock cycles. Therefore, the
bit corresponding to the register (r4), which is a write register of the
instruction (I2), is set to "1". When the next two instructions (I3, I4)
are executed at clock cycle 2, the first instruction (I3) is immediately
executed because the bits corresponding to the read registers (r3, r5) are
reset to "0". But the bit corresponding to the register (r4), which is the
read register of the instruction (I4), is already set to "1". Therefore,
the instruction (I4) waits to be executed. In short, when the execution of
the instruction (I2) is finished at clock cycle 3, the bit "1"
corresponding to the register (r4) is reset to "0". After the bit is reset,
the instruction (I4) is immediately executed at clock cycle 4. At this
time, the instruction (I4) is a multiply operation which requires three
clock cycles. Therefore the bit corresponding to the register (r7), which
is the write register of the instruction (I4), is set to "1".

When the last two instructions (I5, I6) are executed at clock cycle 5, the
bit corresponding to the register (r7), which is a read register of the
first instruction (I5), was already set to "1". Therefore, the instructions
(I5, I6) wait until the execution of the instruction (I4) is finished.
After the bit "1" corresponding to the register (r7) is reset at clock
cycle 6, the two instructions (I5,
the several subsets of the plural instructions and the decision result; and

assignment means (10) for respectively assigning the subset of the plural
instructions to corresponding types of the operation units according to the
decision result stored in the cache memory (3).